Smart Paper Notes

Merged-GHCIDR: Geometrical Approach to Reduce Image Data

Devvrat Joshi , Janvi Thakkar , Siddharth Soni , Shril Mody , Rohan Patil , Nipun Batra

Category: Machine Learning

2022-09-06

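The computational resources required to train a model have been increasing since the inception of deep networks. Training neural networks on large-scale datasets has become a challenging and time-consuming task, so there is a need to reduce the dataset without compromising accuracy. In this paper, we present novel variations of an earlier approach, reduction through homogeneous clustering, for reducing dataset size. The proposed methods are based on the idea of partitioning the dataset into homogeneous clusters and selecting the images that contribute significantly to accuracy. We propose two variations on the baseline algorithm, Reduction through Homogeneous Clustering (RHC), to achieve better accuracy and training time: Geometrical Homogeneous Clustering for Image Data Reduction (GHCIDR) and Merged-GHCIDR. The intuition behind GHCIDR is to select data points using cluster weights and the geometrical distribution of the training set. Merged-GHCIDR merges clusters that have the same label using complete-linkage clustering. We used three deep learning models: Fully Connected Networks (FCN), VGG1, and VGG16. We experimented with the two variants on four datasets: MNIST, CIFAR10, Fashion-MNIST, and Tiny-Imagenet. At the same percentage reduction as RHC, Merged-GHCIDR increased accuracy by 2.8%, 8.9%, 7.6%, and 3.5% on MNIST, Fashion-MNIST, CIFAR10, and Tiny-Imagenet, respectively.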
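The idea of partitioning a dataset into homogeneous clusters can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `homogeneous_clusters` is hypothetical, and the split step is simplified to a single nearest-class-mean assignment instead of a full k-means run. It only shows how recursive splitting can end with clusters that each contain one label.

```python
import numpy as np

def homogeneous_clusters(X, y):
    """Recursively split (X, y) until every cluster holds a single label.

    Returns a list of index arrays, one per homogeneous cluster.
    Simplifying assumption: each split assigns points to the nearest
    class mean once, rather than running k-means to convergence.
    """
    clusters = []
    stack = [np.arange(len(X))]
    while stack:
        idx = stack.pop()
        labels = np.unique(y[idx])
        if len(labels) == 1:
            clusters.append(idx)  # already homogeneous
            continue
        # One mean per label present in this cluster.
        means = np.stack([X[idx][y[idx] == c].mean(axis=0) for c in labels])
        # Distance from every point in the cluster to every class mean.
        dists = np.linalg.norm(X[idx][:, None, :] - means[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        parts = [idx[assign == j] for j in range(len(labels))]
        parts = [p for p in parts if len(p) > 0]
        if len(parts) == 1:
            # The split made no progress (e.g. coinciding class means),
            # so fall back to splitting directly by label.
            parts = [idx[y[idx] == c] for c in labels]
        stack.extend(parts)
    return clusters
```

Every split produces strictly smaller parts, so the loop terminates. A reduction method would then pick representative points from each returned cluster.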


Geometrical Homogeneous Clustering for Image Data Reduction

Shril Mody , Janvi Thakkar , Devvrat Joshi , Siddharth Soni , Rohan Patil , Nipun Batra

Category: Machine Learning

2022-08-27

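In this paper, we present novel variations of an earlier approach, the homogeneous clustering algorithm, for reducing dataset size. The intuition behind the proposed approaches is to partition the dataset into homogeneous clusters and select some images that contribute significantly to accuracy. The selected images are a proper subset of the training data and are therefore human-readable. We propose four variations on the baseline algorithm, RHC. The intuition behind the first approach is that boundary points contribute significantly to the representation of clusters. It involves selecting the k farthest neighbours and the one nearest neighbour of each cluster centroid. In the next two approaches (KONCW and CWKC), we introduce the concept of cluster weights. They are based on the fact that larger clusters contribute more than smaller clusters. The final variation is GHCIDR, which selects points based on the geometrical aspects of the data distribution. We performed experiments on two deep learning models: Fully Connected Networks (FCN) and VGG1. We experimented with the four variants on three datasets: MNIST, CIFAR10, and Fashion-MNIST. We found that GHCIDR gave the best accuracy of 99.35%, 81.10%, and 91.66%, with a training data reduction of 87.27%, 32.34%, and 76.80%, on MNIST, CIFAR10, and Fashion-MNIST, respectively.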
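The selection rule of the first variant, the k farthest points plus the one nearest point relative to a cluster centroid, can be sketched as follows. The function name and the use of Euclidean distance are assumptions made for illustration. The abstract does not specify the distance metric or how ties are handled.

```python
import numpy as np

def select_k_farthest_and_nearest(points, k):
    """Return sorted indices of the k points farthest from the centroid
    plus the single point nearest to it (Euclidean distance assumed)."""
    points = np.asarray(points, dtype=float)
    centroid = points.mean(axis=0)
    dist = np.linalg.norm(points - centroid, axis=1)
    order = np.argsort(dist)      # ascending distance from the centroid
    nearest = order[:1]           # closest point to the centroid
    farthest = order[::-1][:k]    # k boundary points
    # np.unique drops the overlap that occurs when the cluster has <= k points.
    return np.unique(np.concatenate([nearest, farthest]))
```

For a cluster with more than k points this keeps at most k + 1 of them, which is where the reduction in training data comes from.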



Weakly-Supervised Semantic Segmentation of Ships Using Thermal Imagery

Rushil Joshi , Ethan Adams , Matthew Ziemann , Christopher A. Metzler

Category: Computer Vision

2022-12-26

The United States coastline spans 95,471 miles; a distance that cannot be effectively patrolled or secured by manual human effort alone. Unmanned Aerial Vehicles (UAVs) equipped with infrared cameras and deep-learning based algorithms represent a more efficient alternative for identifying and segmenting objects of interest - namely, ships. However, standard approaches to training these algorithms require large-scale datasets of densely labeled infrared maritime images. Such datasets are not publicly available and manually annotating every pixel in a large-scale dataset would have an extreme labor cost. In this work we demonstrate that, in the context of segmenting ships in infrared imagery, weakly-supervising an algorithm with sparsely labeled data can drastically reduce data labeling costs with minimal impact on system performance. We apply weakly-supervised learning to an unlabeled dataset of 7055 infrared images sourced from the Naval Air Warfare Center Aircraft Division (NAWCAD). We find that by sparsely labeling only 32 points per image, weakly-supervised segmentation models can still effectively detect and segment ships, with a Jaccard score of up to 0.756.
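For reference, the Jaccard score reported above is the intersection over union of the predicted and ground-truth masks. A minimal sketch for binary masks follows. The function name is hypothetical, and returning 1.0 for two empty masks is a convention chosen here, not something stated in the abstract.

```python
import numpy as np

def jaccard_score(pred_mask, true_mask):
    """Intersection over union of two binary segmentation masks."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    union = np.logical_or(pred, true).sum()
    if union == 0:
        return 1.0  # assumption: two empty masks count as a perfect match
    intersection = np.logical_and(pred, true).sum()
    return float(intersection / union)
```

A score of 1.0 means the masks coincide exactly, and 0.0 means they share no pixels.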


Cross-Domain Consumer Review Analysis

Aditya Pandey , Kunal Joshi

Category: Machine Learning

2022-12-23

The paper presents a cross-domain review analysis on four popular review datasets: Amazon, Yelp, Steam, IMDb. The analysis is performed using Hadoop and Spark, which allows for efficient and scalable processing of large datasets. By examining close to 12 million reviews from these four online forums, we hope to uncover interesting trends in sales and customer sentiment over the years. Our analysis will include a study of the number of reviews and their distribution over time, as well as an examination of the relationship between various review attributes such as upvotes, creation time, rating, and sentiment. By comparing the reviews across different domains, we hope to gain insight into the factors that drive customer satisfaction and engagement in different product categories.


DePlot: One-shot visual language reasoning by plot-to-table translation

Fangyu Liu , Julian Martin Eisenschlos , Francesco Piccinno , Syrine Krichene , Chenxi Pang , Kenton Lee , Mandar Joshi , Wenhu Chen , Nigel Collier , Yasemin Altun

Categories: Natural Language Processing | Artificial Intelligence | Computer Vision

2022-12-20

Visual language such as charts and plots is ubiquitous in the human world. Comprehending plots and charts requires strong reasoning skills. Prior state-of-the-art (SOTA) models require at least tens of thousands of training examples, and their reasoning capabilities are still very limited, especially on complex human-written queries. This paper presents the first one-shot solution to visual language reasoning. We decompose the challenge of visual language reasoning into two steps: (1) plot-to-text translation, and (2) reasoning over the translated text. The key to this method is a modality conversion module, named DePlot, which translates the image of a plot or chart to a linearized table. The output of DePlot can then be directly used to prompt a pretrained large language model (LLM), exploiting the few-shot reasoning capabilities of LLMs. To obtain DePlot, we standardize the plot-to-table task by establishing unified task formats and metrics, and train DePlot end-to-end on this task. DePlot can then be used off-the-shelf together with LLMs in a plug-and-play fashion. Compared with a SOTA model finetuned on more than 28k data points, DePlot+LLM with just one-shot prompting achieves a 24.0% improvement over finetuned SOTA on human-written queries from the task of chart QA.
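The second step, reasoning over a linearized table with a one-shot prompt, can be sketched in plain Python. The linearization format (cells joined by " | ", one row per line) and the prompt layout are assumptions for illustration, and both function names are hypothetical. The abstract does not give the exact format DePlot emits or the prompt used with the LLM.

```python
def linearize_table(header, rows):
    """Flatten a table into text: one line per row, cells joined by ' | '."""
    lines = [" | ".join(header)]
    lines += [" | ".join(str(cell) for cell in row) for row in rows]
    return "\n".join(lines)

def build_one_shot_prompt(example_table, example_question, example_answer,
                          table, question):
    """Assemble a prompt with one worked example followed by the new query."""
    return (
        f"Table:\n{example_table}\n"
        f"Question: {example_question}\n"
        f"Answer: {example_answer}\n\n"
        f"Table:\n{table}\n"
        f"Question: {question}\n"
        f"Answer:"
    )
```

The resulting string would be passed to whichever LLM is used for the reasoning step; that call is omitted because it depends on the model and its API.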


A Twitter BERT Approach for Offensive Language Detection in Marathi

Tanmay Chavan , Shantanu Patankar , Aditya Kane , Omkar Gokhale , Raviraj Joshi

Category: Natural Language Processing

2022-12-20

Automated offensive language detection is essential in combating the spread of hate speech, particularly in social media. This paper describes our work on Offensive Language Identification in the low-resource Indic language Marathi. The problem is formulated as a text classification task to identify a tweet as offensive or non-offensive. We evaluate different monolingual and multilingual BERT models on this classification task, focusing on BERT models pre-trained with social media datasets. We compare the performance of MuRIL, MahaTweetBERT, MahaTweetBERT-Hateful, and MahaBERT on the HASOC 2022 test set. We also explore external data augmentation from other existing Marathi hate speech corpora, HASOC 2021 and L3Cube-MahaHate. MahaTweetBERT, a BERT model pre-trained on Marathi tweets, outperforms all models with an F1 score of 98.43 on the HASOC 2022 test set when fine-tuned on the combined dataset (HASOC 2021 + HASOC 2022 + MahaHate). With this, we also provide a new state-of-the-art result on the HASOC 2022 / MOLD v2 test set.


KNIFE: Knowledge Distillation with Free-Text Rationales

Aaron Chan , Zhiyuan Zeng , Wyatt Lake , Brihi Joshi , Hanjie Chen , Xiang Ren

Categories: Natural Language Processing | Artificial Intelligence | Machine Learning

2022-12-19

Free-text rationales (FTRs) follow how humans communicate by explaining reasoning processes via natural language. A number of recent works have studied how to improve language model (LM) generalization by using FTRs to teach LMs the correct reasoning processes behind correct task outputs. These prior works aim to learn from FTRs by appending them to the LM input or target output, but this may introduce an input distribution shift or conflict with the task objective, respectively. We propose KNIFE, which distills FTR knowledge from an FTR-augmented teacher LM (takes both task input and FTR) to a student LM (takes only task input), which is used for inference. Crucially, the teacher LM's forward computation has a bottleneck stage in which all of its FTR states are masked out, which pushes knowledge from the FTR states into the task input/output states. Then, FTR knowledge is distilled to the student LM by training its task input/output states to align with the teacher LM's. On two question answering datasets, we show that KNIFE significantly outperforms existing FTR learning methods, in both fully-supervised and low-resource settings.


MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering

Fangyu Liu , Francesco Piccinno , Syrine Krichene , Chenxi Pang , Kenton Lee , Mandar Joshi , Yasemin Altun , Nigel Collier , Julian Martin Eisenschlos

Categories: Natural Language Processing | Artificial Intelligence | Computer Vision

2022-12-19

Visual language data such as plots, charts, and infographics are ubiquitous in the human world. However, state-of-the-art vision-language models do not perform well on these data. We propose MatCha (Math reasoning and Chart derendering pretraining) to enhance visual language models' capabilities in jointly modeling charts/plots and language data. Specifically, we propose several pretraining tasks that cover plot deconstruction and numerical reasoning, which are the key capabilities in visual language modeling. We perform the MatCha pretraining starting from Pix2Struct, a recently proposed image-to-text visual language model. On standard benchmarks such as PlotQA and ChartQA, the MatCha model outperforms state-of-the-art methods by nearly 20%. We also examine how well MatCha pretraining transfers to domains such as screenshots, textbook diagrams, and document figures and observe overall improvement, verifying the usefulness of MatCha pretraining on broader visual language tasks.


Biomedical image analysis competitions: The state of current participation practice

Matthias Eisenmann , Annika Reinke , Vivienn Weru , Minu Dietlinde Tizabi , Fabian Isensee , Tim J. Adler , Patrick Godau , Veronika Cheplygina , Michal Kozubek , Sharib Ali

Categories: Computer Vision | Machine Learning

2022-12-16

The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
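Patch-based training, the most common workaround reported above for samples too large to process at once, relies on cutting each image into smaller tiles. A minimal sketch of the tiling step follows. The function name, the square patch shape, and the fixed stride are illustrative assumptions and do not describe any particular participant's pipeline.

```python
import numpy as np

def extract_patches(image, patch_size, stride):
    """Return a list of (row, col, patch) tuples tiling a 2D image.

    Patches are square views of side patch_size taken every stride pixels.
    Border pixels that do not fit a full patch are skipped in this sketch.
    """
    h, w = image.shape[:2]
    patches = []
    for r in range(0, h - patch_size + 1, stride):
        for c in range(0, w - patch_size + 1, stride):
            patches.append((r, c, image[r:r + patch_size, c:c + patch_size]))
    return patches
```

Using a stride smaller than the patch size yields overlapping patches, and the stored (row, col) offsets allow per-patch predictions to be stitched back into a full-size output.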


Toward Improved Generalization: Meta Transfer of Self-supervised Knowledge on Graphs

Wenhui Cui , Haleh Akrami , Anand A. Joshi , Richard M. Leahy

Category: Machine Learning

2022-12-16

Despite the remarkable success achieved by graph convolutional networks for functional brain activity analysis, the heterogeneity of functional patterns and the scarcity of imaging data still pose challenges in many tasks. Transferring knowledge from a source domain with abundant training data to a target domain is effective for improving representation learning on scarce training data. However, traditional transfer learning methods often fail to generalize the pre-trained knowledge to the target task due to domain discrepancy. Self-supervised learning on graphs can increase the generalizability of graph features since self-supervision concentrates on inherent graph properties that are not limited to a particular supervised task. We propose a novel knowledge transfer strategy by integrating meta-learning with self-supervised learning to deal with the heterogeneity and scarcity of fMRI data. Specifically, we perform a self-supervised task on the source domain and apply meta-learning, which strongly improves the generalizability of the model using the bi-level optimization, to transfer the self-supervised knowledge to the target domain. Through experiments on a neurological disorder classification task, we demonstrate that the proposed strategy significantly improves target task performance by increasing the generalizability and transferability of graph-based knowledge.
